This paper presents the first convergence result for random search algorithms
to a subset of the Pareto set of given maximum size $k$ with bounds on the
approximation quality $\epsilon$. The core of the algorithm is a new selection
criterion based on a hypothetical multilevel grid on the objective space. It is
shown that, when using this criterion for accepting new search points, the
sequence of solution archives converges with probability one to a subset of the
Pareto set that $\epsilon$-dominates the entire Pareto set. The obtained
approximation quality $\epsilon$ is equal to the size of the grid cells on the
finest level of resolution that allows an approximation with at most $k$ points
in the family of grids considered. While the convergence result is of general
theoretical interest, the archiving algorithm might be of high practical value
for any type of iterative multiobjective optimization method, such as
evolutionary algorithms or other metaheuristics, which all rely on a finite
on-line memory to store the best solutions found so far as the current
approximation of the Pareto set.



1 Introduction 

In multiobjective optimization, we are given $m \geq 2$ objective functions
$f_i : X \to \mathbb{R}$, $i \in \{1, \dots, m\}$, defined over some search space
$X$, which might be implicitly defined by constraints. We assume the search space
$X$ to be finite and that, w.l.o.g., all objectives shall be maximized. We are
therefore interested in solving

$$\max \{ f(x) = (f_1(x), \dots, f_m(x))^T \mid x \in X \} \tag{1}$$

Here, maximization is understood with respect to the product order $\geq$ on
$\mathbb{R}^m$, i.e., for any pair $(y, y') \in \mathbb{R}^m \times \mathbb{R}^m$,
$y \geq y'$ if and only if $y_i \geq y'_i$ for all $i \in \{1, \dots, m\}$. Hence
$(Y, \geq)$, where $Y = f(X)$ is called the objective space, is a partially
ordered set. This gives rise to the following order relation on the search space.

Definition 1 (Pareto dominance). For any pair $(x, x') \in X \times X$, $x$ is
said to weakly dominate $x'$, denoted as $x \succeq x'$, if and only if
$f(x) \geq f(x')$. $x$ is said to dominate $x'$, denoted as $x \succ x'$, if and
only if $x \succeq x'$ and $x' \not\succeq x$.
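Definition 1 can be sketched in a few lines of Python. This is only an illustration: it assumes objective vectors are given as equal-length tuples of numbers to be maximized, and the function names are hypothetical, not taken from the paper.

```python
from typing import Sequence


def weakly_dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """f(x) >= f(x') in the product order (all objectives are maximized)."""
    return all(a >= b for a, b in zip(fx, fy))


def dominates(fx: Sequence[float], fy: Sequence[float]) -> bool:
    """x weakly dominates x', while x' does not weakly dominate x."""
    return weakly_dominates(fx, fy) and not weakly_dominates(fy, fx)
```

For instance, the vector (2, 3) dominates (1, 3), whereas (2, 1) and (1, 2) are incomparable.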

Note that $(X, \succeq)$ is a preordered set, while $(X, \succ)$ is a strictly
partially ordered set. A subset $X' \subseteq X$ is called independent with
respect to $\succeq$ if for all $(x, x') \in X' \times X'$ where $x \neq x'$ it
holds that $x \not\succeq x'$. Let the set of all such independent subsets of $X$
be denoted by $\mathbb{X}$, i.e.,

$$\mathbb{X} := \{ X' \subseteq X \mid X' \text{ is independent with respect to } \succeq \}.$$

The set of maximal elements of the objective space,

$$Y^* := \max(Y, \geq) = \{ y \in Y \mid \nexists\, y' \neq y \text{ with } y' \geq y \},$$

is called the Pareto front, and its preimage $X^* = f^{-1}(Y^*)$ is called the
Pareto set.
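For a small finite search space, the Pareto set and Pareto front can be computed by brute force. The following sketch assumes that search points are hashable and that `f` returns a tuple of objective values; `pareto_set` and `pareto_front` are illustrative names.

```python
def pareto_set(X, f):
    """Points of X whose objective vector is not dominated by another vector."""
    values = {x: tuple(f(x)) for x in X}

    def dominated(v):
        # There is some other vector w != v with w >= v componentwise.
        return any(
            w != v and all(a >= b for a, b in zip(w, v))
            for w in values.values()
        )

    return [x for x in X if not dominated(values[x])]


def pareto_front(X, f):
    """Image of the Pareto set in the objective space."""
    return {tuple(f(x)) for x in pareto_set(X, f)}
```

With the five points (0,0), (1,2), (2,1), (2,2), (0,3) and the identity as objective function, only (2,2) and (0,3) are Pareto-optimal.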

Ideally, when solving (1), one is interested in determining the Pareto front
$Y^*$, together with an independent set $X'$ that should cover $Y^*$, i.e.,
$f(X') = Y^*$. This means we are usually not interested in obtaining more than
one preimage for each element of the Pareto front.

In many instances, the size of the Pareto front might be immense, so we are
interested in approximations. Here, our goal is to find some subset of $X$ of a
given maximum size $k$ that approximates $Y^*$ well in the following sense.

Definition 2 ($\epsilon$-dominance). Let $\epsilon \in \mathbb{R}^m$,
$\epsilon \geq 0$. For any pair $(x, x') \in X \times X$, $x$ is said to
$\epsilon$-dominate $x'$, denoted as $x \succeq_\epsilon x'$, if and only if
$f(x) + \epsilon \geq f(x')$.

An independent set $A \in \mathbb{X}$ is called an $\epsilon$-approximate Pareto
set if for all $x \in X$ there is an $x' \in A$ with $x' \succeq_\epsilon x$. An
$\epsilon$-approximate Pareto set that is a subset of $X^*$ is called an
$\epsilon$-Pareto set. Thus, a reasonable task is to find an $\epsilon$-Pareto
set of some given maximum cardinality $k$. Note that for the special case of
$\epsilon = 0$, the notions of $\epsilon$-approximate Pareto set,
$\epsilon$-Pareto set, and covering independent set are equivalent.
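The two notions of Definition 2 translate directly into a check over a finite search space. The sketch below assumes tuple-valued objectives and a tuple `eps` of the same length; it only tests the covering condition and does not verify independence of `A` or membership in the Pareto set.

```python
def eps_dominates(fx, fy, eps):
    """f(x) + eps >= f(x') componentwise."""
    return all(a + e >= b for a, e, b in zip(fx, eps, fy))


def is_eps_approximate(A, X, f, eps):
    """Every point of X is eps-dominated by at least one member of A."""
    return all(any(eps_dominates(f(a), f(x), eps) for a in A) for x in X)
```

On the points (0,0), (1,2), (2,1), (2,2), (0,3) with the identity as objective function, the single point (2,2) covers the whole space for eps = (1,1) but not for eps = (0,0), because (0,3) is then left uncovered.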

We investigate in which sense simple random search is able to find an
$\epsilon$-Pareto set of cardinality $k$ of the multiobjective optimization
problem (1), i.e., whether its solution set stochastically converges to such a
set in the limit. For this, we consider Algorithm 1, which is pure random search
where the set $A_t$ represents its archive of solutions, stored in a memory
(array) of size at most $k$.

2 Related work and previous results 

Many different notions of Pareto set approximations have been proposed in the
literature, see for instance the survey on concepts of $\epsilon$-efficiency
in [7]. However, many of them deal with infinite sets and are therefore not
practical as a solution concept for our purpose of producing and maintaining a
representative subset of a fixed given size. The use of discrete
$\epsilon$-approximations of the Pareto set was proposed almost simultaneously
by various authors [6, 4, 15, 18]. The general idea is that each Pareto-optimal
point is approximately dominated by some point of the approximation set. The
$\epsilon$-dominance given in Definition 2 is a typical instance of this
approach, while it is also common to use a relative deviation instead of the
absolute deviation employed here.

Algorithm 1 Simple random search algorithm

1: $t \leftarrow 0$
2: $A_0 \leftarrow \emptyset$
3: loop
4:   $t \leftarrow t + 1$
5:   Draw $x_t$ from $X$ uniformly at random
6:   $A_t \leftarrow \mathrm{update}(A_{t-1}, x_t)$
7: end loop
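The loop of Algorithm 1 can be sketched as follows. Two departures from the pseudocode are assumptions of this sketch: the infinite loop is truncated after `steps` iterations, and the archiving rule is passed in as a callable `update(archive, x)`, so that any update function can be plugged in.

```python
import random


def random_search(X, update, steps, seed=0):
    """Pure random search over the finite sequence X with an archive."""
    rng = random.Random(seed)
    archive = frozenset()              # line 2: empty initial archive
    for _ in range(steps):             # lines 3-4, truncated to `steps` rounds
        x = rng.choice(X)              # line 5: uniform draw from X
        archive = update(archive, x)   # line 6: archiving step
    return archive
```

With a trivial single-objective rule that keeps only the largest value seen so far, the archive ends up holding the maximum of the search space once that value has been drawn.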

As relative deviation is essentially equivalent to absolute deviation on a
logarithmically scaled objective space, this choice should not affect the
convergence results obtained but rather depend on the actual application problem
at hand. The nice property of relative deviation is that it allows one to prove
that, under very mild assumptions, there is always an $\epsilon$-Pareto set
whose size is polynomial in the input length [13, 3]. Further approximation
results for particular combinatorial multiobjective optimization problems are
given in [2], where the question was how well a single solution can approximate
the whole Pareto set, which is a special case of our question restricted to
$k = 1$ and with focus on deterministic algorithms.

Despite the existence of suitable approximation concepts, investigations on the
convergence of particular algorithms towards such approximation sets, that is,
their ability to obtain a suitable Pareto set approximation in the limit, have
remained rare. In [16, 17] the stochastic search procedure proposed earlier
by [14] was analyzed and proved to converge to an $\epsilon$-Pareto set with
$\epsilon = 0$ in case of a finite search space. Obviously, the solution set
maintained by this algorithm might in the worst case grow as large as the Pareto
set $X^*$ itself. Thus, a different version with bounded memory of at most $k$
elements was proposed and shown to converge to some subset of $X^*$ of size at
most $k$, but no guarantee about the approximation quality could be given.
Similar results were obtained by [5] for continuous search spaces.

One option to control the approximation quality under size restrictions is to
define a quality indicator which maps each possible solution set to a real value
that can then be used to decide on the inclusion of a new search point. Several
algorithms have been proposed that implement this concept [20, 1]. If such a
quality indicator fulfils certain monotonicity conditions, it can be used as a
potential function in the convergence analysis. As shown in [8, 9], this entails
convergence to a subset of the Pareto set as a local optimum of the quality
indicator, but it remained open how such a local optimum relates to a guarantee
on the approximation quality $\epsilon$. [9] also analyzed an adaptive grid
archiving method proposed in [10] and proved that after finite time, even though
the solution set itself might permanently oscillate, it will always represent an
$\epsilon$-approximation whose approximation quality depends on the granularity
of the adaptive grid and on the number of allowed solutions. The results depend
on the additional assumption that the grid boundaries converge after finite
time, which is fulfilled in certain special cases.

In [12], two archiving algorithms were proposed that provably maintain a
finite-size $\epsilon$-approximation of all points ever generated during the
search process. As an immediate corollary, these archiving strategies ensure
convergence to a Pareto set approximation of given quality $\epsilon$. While the
desired $\epsilon$ is an input to the algorithm that can be specified
beforehand, the resulting size of the approximation can be bounded as a function
of $\epsilon$ and the ranges of the objective space $Y$. In [11] it was shown
that in practice these bounds are sometimes not tight enough, which is a
particular disadvantage when applied in a scenario where the maximum archive
size $k$ has to be specified beforehand. If information on the objective space
ranges is available, the bounds can be used to find a valid value for
$\epsilon$, but this choice often turns out overly conservative so that far
fewer solutions are attained than would be possible with a memory of size $k$.
If the objective ranges are unknown, a mechanism proposed in [12] can be used to
systematically increase the value of $\epsilon$ without losing the convergence
properties, but this suffers from the same drawback of being overly
conservative. Thus, it has remained open until now whether working with fixed
size Pareto set approximations can guarantee convergence in the limit for
arbitrary multiobjective optimization problems on finite search spaces, and at
the same time guarantee a certain approximation quality.

In this paper we settle this question positively by presenting an archiving
scheme that enables Algorithm 1 to produce a sequence of solution sets that
converges with probability one to an $\epsilon$-Pareto set of a certain quality.
The algorithm represents a combination of the two complementary algorithms
discussed above [9, 12], thus combining the advantages of both: it can be seen
as a variant of the adaptive grid archiving method where a multilevel, fixed
grid is used instead of an adaptive grid with moving boundaries, which is
crucial to obtain convergence. It can also be seen as a proper implementation of
the adaptation mechanism for the $\epsilon$ values proposed in [12], which is
crucial to limit the size of the solution set to at most $k$, but which is also
able to reduce the value of $\epsilon$ whenever possible. Finally, the algorithm
can be seen as selection using a particular quality indicator [20], a notion
that will be defined more precisely later on. However, instead of having to
compute the actual indicator values, which might be computationally cumbersome,
this indicator will only be used as a potential function in the analysis of the
algorithm and never has to be computed. The actual comparison will be defined
using very simple local rules that, and this is crucial, are in accordance with
the quality indicator, which will be established via order homomorphisms.

3 Stochastic convergence analysis 

The sequences of archives $\{A_t, t \in \mathbb{N}_0\}$ generated by Algorithm 1
are realizations of a discrete-time stochastic process defined on a probability
space $(\Omega, \mathcal{F}, P)$. The sample space $\Omega$ can be defined as the
infinite product $\Omega = \mathbb{X}^{\mathbb{N}_0}$ of the set $\mathbb{X}$ of
independent subsets of $X$, with the $\sigma$-algebra $\mathcal{F} = 2^\Omega$
being its power set. The probability measure $P$ is defined implicitly by
Algorithm 1. The stochastic process is then the sequence of random variables
$\{A_t, t \in \mathbb{N}_0\}$, where

$$A_t : \Omega \to \mathbb{X}, \qquad A_t(\omega) = \omega_t,$$

and, by slight abuse of notation, $A_t = A_t(\omega)$ also denotes the realized
archive.

From Algorithm 1 it is clear that $A_{t+1}$ only depends on $A_t$, so the process
is a homogeneous finite Markov chain with state space $\mathbb{X}$. Due to line 5
of the algorithm, all transition probabilities that change the state are equal
to a common value $p$.

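Let the transition graph of the Markov chain be denoted by $G = (\mathbb{X}, E)$,
where $\mathbb{X}$ is the set of independent subsets of $X$. Clearly, its arcs
$E \subseteq \mathbb{X} \times \mathbb{X}$ are determined by the update function
given in Algorithm 2 as

$$E = \{ (A, A') \in \mathbb{X} \times \mathbb{X} \mid \exists x \in X : A' = \mathrm{update}(A, x) \}.$$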

Our goal is to show that this transition graph, when ignoring loops, forms a directed 
acyclic graph, which immediately implies that absorption will take place with probability 
one in finite time. We then proceed to show that in all absorbing states the archives are 
e-Pareto sets, and finally give some guarantee of the approximation quality e obtained. 

Instead of working on $E$ directly, however, we define a potential function $I$,
according to which the set of possible archives $\mathbb{X}$ can be linearly
ordered. For this, some auxiliary notation is needed.

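Definition 3 (box index vector). The box index vector of a vector
$y \in \mathbb{R}^m$ at level $b \in \mathbb{Z}$ is given by the value of the
function

$$\beta^{(b)} : \mathbb{R}^m \to \mathbb{R}^m, \qquad \beta^{(b)}(y) := (r^{(b)}(y_1), \dots, r^{(b)}(y_m))^T,$$

where $r^{(b)}$ maps a single objective value to the coordinate of its grid cell
at level $b$.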



Definition 4 (box-dominance). Let $b \in \mathbb{Z}$. For any pair
$(x, x') \in X \times X$, $x$ is said to weakly box-dominate $x'$ at level $b$,
denoted as $x \trianglerighteq_b x'$, if and only if
$\beta^{(b)}(f(x)) \geq \beta^{(b)}(f(x'))$. $x$ is said to be box-equal to $x'$,
denoted as $x \sim_b x'$, if $x \trianglerighteq_b x'$ and
$x' \trianglerighteq_b x$. If $x \trianglerighteq_b x'$ and $x \not\sim_b x'$ then
$x$ is said to box-dominate $x'$ at level $b$, denoted as $x \triangleright_b x'$.

Note that the relations $\trianglerighteq_b$ form a family of order extensions of
the dominance relation $\succeq$. The accompanying equivalence relations
$\sim_b$ can be seen as a successive coarse-graining of approximate indifference
between solutions.
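Definitions 3 and 4 can be illustrated with a short sketch. The exact form of the scalar map $r^{(b)}$ is not available in the text above, so the code assumes a nested grid with cells of width $2^b$, i.e. $r^{(b)}(v) = \lfloor v \cdot 2^{-b} \rfloor$. This choice is an assumption of the sketch and may differ from the paper's definition, for example by an offset.

```python
import math


def box_index(y, b):
    """Box index vector at level b.

    ASSUMPTION: r_b(v) = floor(v / 2**b); this form is not given in the text.
    """
    return tuple(math.floor(v / 2.0 ** b) for v in y)


def weakly_box_dominates(fx, fy, b):
    """beta_b(f(x)) >= beta_b(f(x')) componentwise."""
    return all(p >= q for p, q in zip(box_index(fx, b), box_index(fy, b)))


def box_equal(fx, fy, b):
    """Both vectors fall into the same grid cell at level b."""
    return box_index(fx, b) == box_index(fy, b)
```

Under this assumption the vectors (5, 1) and (4, 2) are incomparable at level 0, (4, 2) weakly box-dominates (5, 1) at level 1, and both fall into the same box at level 2, which illustrates the successive coarse-graining.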

Lemma 1. If $x \trianglerighteq_b x'$ then $x \trianglerighteq_c x'$ for all
$c \geq b$.

Proof. We show for all components $i \in \{1, \dots, m\}$ that if
$r^{(b)}(f_i(x)) \geq r^{(b)}(f_i(x'))$ then
$r^{(b+1)}(f_i(x)) \geq r^{(b+1)}(f_i(x'))$. The lemma follows then by induction.
Let $d := f_i(x) \cdot 2^{-b} + 1/2$ and $d' := f_i(x') \cdot 2^{-b} + 1/2$. From
the premise we can express $d = p + 1/2 + g$ and $d' = p + 1/2 + 1 - h$ for some
$p \in \mathbb{Z}$, $g \geq 0$ and $h > 0$. If $p = 2k$ for some
$k \in \mathbb{Z}$, then
$r^{(b+1)}(d) \geq \lfloor 2k/2 + 1/2 \rfloor = k \geq \lfloor 2k/2 + 1 - h/2 \rfloor = r^{(b+1)}(d')$.
If $p = 2k + 1$ then
$r^{(b+1)}(d) \geq \lfloor (2k+1)/2 + 1/2 \rfloor = k + 1 \geq \lfloor (2k+1)/2 + 1 - h/2 \rfloor = r^{(b+1)}(d')$. □
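The monotonicity stated in Lemma 1 can be checked numerically on a sample of values. The sketch assumes the nested-grid map $r^{(b)}(v) = \lfloor v \cdot 2^{-b} \rfloor$, which is not the paper's stated definition but a stand-in for which the property is easy to verify; `lemma1_holds` is a hypothetical helper.

```python
import math
from itertools import product


def r(v, b):
    # ASSUMPTION: nested grid with cell width 2**b, r_b(v) = floor(v / 2**b).
    return math.floor(v / 2.0 ** b)


def lemma1_holds(values, levels):
    """Check r_b(u) >= r_b(w)  =>  r_{b+1}(u) >= r_{b+1}(w) on all sample pairs."""
    return all(
        r(u, b + 1) >= r(w, b + 1)
        for u, w in product(values, repeat=2)
        for b in levels
        if r(u, b) >= r(w, b)
    )
```

A check over quarter-integer values between -10 and 10 and levels -2 to 3 finds no violation under this assumption.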
Algorithm 2 update($A$, $x$)

1: $A' \leftarrow A \cup \{x\}$
2: if $\exists a \in A : a \succeq x$ then
3:   return $A$
4: else if $|\max(A', \succ)| \leq k$ then
5:   return $\max(A', \succ)$
6: else
7:   $\beta \leftarrow \min\{ b \in \mathbb{Z} \mid \exists (a, a') \in A' \times A' : a \trianglerighteq_b a' \text{ and } a \neq a' \}$
8:   $A'' \leftarrow \{ a \in A' \mid \exists a' \in A' : a' \trianglerighteq_\beta a \text{ and } a' \neq a \}$
9:   if $x \in A''$ then
10:    return $A$
11:  else
12:    Draw $a$ uniformly at random from $A''$
13:    return $A \setminus \{a\} \cup \{x\}$
14:  end if
15: end if
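A possible Python rendering of Algorithm 2 is given below as a sketch under several assumptions that are not part of the text: objectives are non-negative integers, the scalar grid map is $r^{(b)}(v) = \lfloor v \cdot 2^{-b} \rfloor$, the search for $\beta$ starts at level 0 (sufficient for integer objectives under this map), and the size test in line 4 is read as "at most $k$". The comments refer to the line numbers of Algorithm 2.

```python
import math
import random


def weakly_dominates(u, v):
    return all(a >= b for a, b in zip(u, v))


def box(u, b):
    # ASSUMPTION: r_b(v) = floor(v / 2**b) on non-negative integer objectives.
    return tuple(math.floor(a / 2.0 ** b) for a in u)


def maximal(points, f):
    """Points that are not strictly dominated by another point."""
    return frozenset(
        p for p in points
        if not any(weakly_dominates(f(q), f(p)) and not weakly_dominates(f(p), f(q))
                   for q in points)
    )


def update(A, x, f, k, rng=random):
    A = frozenset(A)
    A1 = A | {x}                                          # line 1
    if any(weakly_dominates(f(a), f(x)) for a in A):      # line 2
        return A                                          # line 3
    M = maximal(A1, f)
    if len(M) <= k:                                       # line 4
        return M                                          # line 5

    def box_dominated(b):
        # Members of A' weakly box-dominated at level b by another member.
        return {a for a in A1
                if any(q != a and weakly_dominates(box(f(q), b), box(f(a), b))
                       for q in A1)}

    beta = 0                                              # line 7: finest level
    while not box_dominated(beta):                        # with a comparable pair
        beta += 1
    A2 = box_dominated(beta)                              # line 8
    if x in A2:                                           # line 9
        return A                                          # line 10
    a = rng.choice(sorted(A2))                            # line 12
    return (A - {a}) | {x}                                # line 13
```

With $k = 2$ and the identity as objective function, inserting (4, 0) into the archive {(0, 4), (1, 3)} evicts (1, 3), because (1, 3) shares a level-1 box with (0, 4), whereas the candidate (3, 3) is rejected by the archive {(0, 4), (4, 0)} because it is the first point to become box-dominated.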



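Definition 5 (potential function). Let the potential function $I(A)$ of an
archive $A$ be given by a power series whose coefficients are the cardinalities
$|B_i(A)|$ and whose summation index starts at $i = -\bar{b}$, where

$$B_b(A) = \{ x \in X \mid \exists a \in A : a \trianglerighteq_b x \}$$

and

$$\bar{b} = \min \{ b \in \mathbb{Z} \mid \forall (x, x') \in X \times X : x \sim_b x' \}.$$

The power series defining $I$ converges as the $B_b$ are subsets of $X$, which is
finite. Moreover, $\bar{b}$ exists since it is possible to enclose the whole
objective space $f(X)$ by one box by choosing $b$ large enough.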
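The ingredients of the potential function, namely the sets $B_b(A)$ and the level $\bar{b}$, can be computed directly on a small instance. The sketch does not evaluate $I$ itself; it only lists the coefficients $|B_b(A)|$. It assumes non-negative integer objectives and the grid map $r^{(b)}(v) = \lfloor v \cdot 2^{-b} \rfloor$, and it searches for $\bar{b}$ upwards from a starting level that is assumed to be low enough.

```python
import math


def box(u, b):
    # ASSUMPTION: r_b(v) = floor(v / 2**b); not the paper's stated formula.
    return tuple(math.floor(a / 2.0 ** b) for a in u)


def covered(A, X, f, b):
    """B_b(A): points of X weakly box-dominated at level b by a member of A."""
    return {
        x for x in X
        if any(all(p >= q for p, q in zip(box(f(a), b), box(f(x), b))) for a in A)
    }


def coefficients(A, X, f, levels):
    """Cardinalities |B_b(A)| for the given levels."""
    return [len(covered(A, X, f, b)) for b in levels]


def b_bar(X, f, start=0):
    """Smallest level >= start at which all points of X share one box."""
    b = start
    while len({box(f(x), b) for x in X}) > 1:
        b += 1
    return b
```

For the search space {(0,4), (1,3), (4,0), (2,2)} and the archive {(0,4), (4,0)}, the coefficients on levels 0 to 3 are 2, 3, 4, 4, and all points share a single box from level 3 onwards.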

The dominance relation on solutions can be used to define a natural preference
relation on the set $\mathbb{X}$ of independent sets.

Definition 6 (dominance of independent sets). Let
$(A, A') \in \mathbb{X} \times \mathbb{X}$. The set $A$ is said to weakly dominate
$A'$, denoted as $A \sqsupseteq A'$, if
$\max(f(A \cup A'), \geq) = \max(f(A), \geq)$.
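Definition 6 compares two sets through the maximal elements of their images. A minimal sketch, assuming tuple-valued objectives and using the hypothetical helper names `front` and `set_weakly_dominates`:

```python
def front(vectors):
    """Maximal elements of a finite set of objective vectors under >=."""
    vs = {tuple(v) for v in vectors}
    return {
        v for v in vs
        if not any(w != v and all(a >= b for a, b in zip(w, v)) for w in vs)
    }


def set_weakly_dominates(A, B, f):
    """max(f(A u B), >=) == max(f(A), >=)."""
    fa = [f(a) for a in A]
    fb = [f(b) for b in B]
    return front(fa + fb) == front(fa)
```

The set {(2,2), (0,3)} weakly dominates {(1,1)} but not vice versa, while {(2,2)} and {(0,3)} are incomparable.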

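Lemma 2. $I$ is an order homomorphism of $(\mathbb{X}, \sqsupseteq)$ into
$(\mathbb{R}, \geq)$, i.e., if $A \sqsupseteq A'$ then $I(A) \geq I(A')$. $I$ is
also an order homomorphism of $(\mathbb{X}, \sqsupset)$ into $(\mathbb{R}, >)$,
where $\sqsupset$ denotes the strict part of $\sqsupseteq$, i.e., if
$A \sqsupset A'$ then $I(A) > I(A')$.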

Proof. If $A \sqsupseteq A'$ then the coefficients $|B_i(A)|$ in the power series
of $I(A)$ are uniformly not less than $|B_i(A')|$ because
$B_i(A') \subseteq B_i(A)$ for all $i$. If additionally $A' \not\sqsupseteq A$
then there exists an $a \in A$ and a $b \in \mathbb{Z}$ such that there is no
$a' \in A'$ with $a' \trianglerighteq_b a$. Hence, $B_b(A') \subset B_b(A)$, which
implies that $I(A) > I(A')$. □
The potential function $I$ can be seen as a quality indicator for independent
sets. As an immediate corollary of Lemma 2, the $>$-relation on $I(A)$ represents
a comparison method that is $\not\sqsubseteq$-compatible and
$\sqsupset$-complete in the terminology of [21].

Lemma 3. If $(A_t, A_{t+1}) \in E$ and $A_t \neq A_{t+1}$ then
$I(A_{t+1}) > I(A_t)$.

Proof. If $(A_t, A_{t+1}) \in E$ and $A_t \neq A_{t+1}$ then
$x \in A_{t+1} = \mathrm{update}(A_t, x)$. There are two cases that can cause $x$
to be included into the new archive (termination in line 5 or line 13). For
termination to occur in line 5, $|\max(A', \succ)| \leq k$ has to hold (line 4),
and furthermore $x \in \max(A_{t+1}, \succ)$, since the condition in line 2 was
not met. Since $A_t = \max(A_t, \succ) \in \mathbb{X}$ it follows that
$A_{t+1} = \max(A', \succ) \sqsupset A_t$ and thus, by Lemma 2,
$I(A_{t+1}) > I(A_t)$. When termination occurs in line 13,
$\max(A', \trianglerighteq_\beta)$ contains $x$ but not $a$. Hence
$B_\beta(A_{t+1}) \supset B_\beta(A_t)$, which implies, by Lemma 1, that
$B_b(A_{t+1}) \supseteq B_b(A_t)$ for all $b \geq \beta$. This implies
$I(A_{t+1}) > I(A_t)$, which completes the proof. □

Theorem 1. The Markov chain $\{A_t, t \in \mathbb{N}_0\}$ is absorbing.

Proof. Due to Lemma 3, the mapping $I$ is an order homomorphism of $G$ into a
strict linear order. This implies that the transitive closure of $G$ is a
partially ordered set, and hence $G$ is a directed acyclic graph. As any Markov
chain on a directed acyclic graph is absorbing, the claim follows. □

Theorem 2. For any absorbing state $A \in \max(G)$ it holds that
$A \subseteq X^*$.

Proof. Assume that $A \not\subseteq X^*$, so there is some $a \in A$ that is not
in $X^*$. Then there exists some $x \in X$ with $x \succ a$. Let
$\tilde{A} = \mathrm{update}(A, x)$. Since $x \succ a$, the condition
$|\max(A', \succ)| \leq k$ in line 4 holds because $a \notin \max(A', \succ)$.
Thus, $\tilde{A} = \max(A', \succ) \neq A$, which is a contradiction to the
assumption. □

The next theorem shows that any absorbing state box-dominates the Pareto set at the 
lowest level possible with the least number of solutions necessary, while distributing the 
remaining solutions with maximum entropy over the nondominated boxes at the next 
lower level. 

Theorem 3. Let

$$\delta = \min \{ b \in \mathbb{Z} \mid \exists A \subseteq X^* \text{ with } |A| \leq k \text{ and } B_b(A) = X \}, \tag{2}$$

denote the smallest level $b$ on which it is possible to box-dominate the Pareto
set with at most $k$ solutions,

$$s = \min \{ |A| : B_\delta(A) = X \}, \tag{3}$$

denote the minimum size of such a set, and let

$$\mathcal{S} = \{ X' \subseteq X^* \mid \forall (x, x') \in X' \times X' : x \sim_{\delta - 1} x' \} \tag{4}$$

denote a partitioning of the Pareto set into the boxes of the next smaller
level. Then for any absorbing state $A \in \max(G)$ it holds that
$B_\delta(A) = X$ and that

$$|\{ X' \in \mathcal{S} : X' \cap A \neq \emptyset \}| = \min\{k, |\mathcal{S}|\}. \tag{5}$$
Proof. Assume that there is some $x \in X^*$ with $x \notin B_\delta(A)$. If
$|A| < k$ then $|\max(A', \succ)| \leq k$ in line 4, in which case $A$ cannot be
absorbing. If $|A| = k$ then $A''$ cannot be empty, as this would contradict the
definition of $\beta$ in line 7. If $x \notin A''$ then $x$ will enter $A$, so
$A$ cannot be absorbing. If $x \in A''$ then $\beta > \delta$ due to Lemma 1,
since $x \notin B_\delta(A)$ but $x \in B_\beta(A)$ by assumption. The definition
of $\beta$ in line 7 now implies that $A'$ is an independent set of cardinality
$k + 1$ with respect to $\trianglerighteq_\delta$, hence $A'$ serves as a witness
for the fact that there is no $A$ with $|A| \leq k$ and $B_\delta(A) = X$, which
is a contradiction to (2) and completes the proof that $B_\delta(A) = X$. To
prove the second part of the proposition, assume that $B_\delta(A) = X$ and
$d := |\{ X' \in \mathcal{S} : X' \cap A \neq \emptyset \}| < \min\{k, |\mathcal{S}|\}$.
Hence there is some $x \in X' \in \mathcal{S}$ with $X' \cap A = \emptyset$. If
$|A| < k$ then $x$ will be accepted as there is no $a \in A$ with
$a \succeq x$, so $A$ cannot be absorbing. If $|A| = k$ then $\beta < \delta$ as
otherwise $d = k$. Now, as $\nexists a \in A$ with
$a \trianglerighteq_{\delta - 1} x$ and $\beta \leq \delta - 1$, it follows with
Lemma 1 that $x \notin A''$, thus $x$ will be accepted, so $A$ cannot be
absorbing. □
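On a toy instance, the level $\delta$ from (2) and the minimum size $s$ from (3) can be found by exhaustive search. The sketch assumes non-negative integer objectives and the grid map $r^{(b)}(v) = \lfloor v \cdot 2^{-b} \rfloor$. The parameters `b_start` and `b_max` are hypothetical bounds for the search over levels and are assumed to bracket $\delta$.

```python
import math
from itertools import combinations


def box(u, b):
    # ASSUMPTION: r_b(v) = floor(v / 2**b); not the paper's stated formula.
    return tuple(math.floor(a / 2.0 ** b) for a in u)


def covers(A, X, f, b):
    """True if B_b(A) = X, i.e. every point is weakly box-dominated by A."""
    return all(
        any(all(p >= q for p, q in zip(box(f(a), b), box(f(x), b))) for a in A)
        for x in X
    )


def smallest_level(X, pareto, f, k, b_start=0, b_max=32):
    """Return (delta, s, witness) by brute force, or None if nothing is found."""
    for b in range(b_start, b_max + 1):
        for size in range(1, k + 1):
            for A in combinations(pareto, size):
                if covers(A, X, f, b):
                    return b, size, A
    return None
```

For the triangular search space of integer points $(i, j)$ with $i + j \leq 3$, the identity as objective function and $k = 2$, the search returns $\delta = 1$ and $s = 2$, so Theorem 4 would give $\epsilon = 2^1 = 2$ under the assumed grid.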

We have collected now all necessary ingredients to show the main result. 

Theorem 4. The sequence $\{A_t, t \in \mathbb{N}_0\}$ converges with probability
one to some $\epsilon$-Pareto set with $\epsilon = 2^\delta$, with $\delta$
defined as in Theorem 3.

Proof. As a corollary to Theorems 1 and 2, $A_t$ converges with probability one
to some subset of the Pareto set. As a corollary of Theorem 3, for any absorbing
state $A$ there exists for all $x \in X^*$ some $a \in A$ such that
$a \trianglerighteq_\delta x$ and, hence, $a \succeq_\epsilon x$ for all
$\epsilon \geq 2^\delta$. □
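The convergence behaviour can be observed empirically on a toy problem by combining the random search loop with the archive update. The following simulation is a sketch under stated assumptions, not the author's implementation: objectives are non-negative integers, the grid map is $r^{(b)}(v) = \lfloor v \cdot 2^{-b} \rfloor$, the loop is truncated after a fixed number of steps, and the problem instance (integer points with $i + j \leq 3$, both coordinates maximized) is chosen only for illustration.

```python
import math
import random


def weakly_dominates(u, v):
    return all(a >= b for a, b in zip(u, v))


def dominates(u, v):
    return weakly_dominates(u, v) and not weakly_dominates(v, u)


def box(u, b):
    # ASSUMPTION: r_b(v) = floor(v / 2**b) on non-negative integer objectives.
    return tuple(math.floor(a / 2.0 ** b) for a in u)


def update(A, x, f, k, rng):
    """Archive update following the structure of Algorithm 2."""
    A1 = A | {x}
    if any(weakly_dominates(f(a), f(x)) for a in A):
        return A
    M = frozenset(p for p in A1 if not any(dominates(f(q), f(p)) for q in A1))
    if len(M) <= k:
        return M

    def dominated_at(b):
        return sorted(
            a for a in A1
            if any(q != a and weakly_dominates(box(f(q), b), box(f(a), b))
                   for q in A1)
        )

    beta = 0
    while not dominated_at(beta):
        beta += 1
    A2 = dominated_at(beta)
    if x in A2:
        return A
    return (A - {rng.choice(A2)}) | {x}


def run(k=2, steps=20000, seed=1):
    """Random search on integer points (i, j) with i + j <= 3."""
    rng = random.Random(seed)
    X = [(i, j) for i in range(4) for j in range(4) if i + j <= 3]
    f = lambda p: p
    A = frozenset()
    for _ in range(steps):
        A = update(A, rng.choice(X), f, k, rng)
    return A
```

After the run, the archive holds at most $k = 2$ points, all on the Pareto front $i + j = 3$, in line with Theorems 1 and 2.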

4 Conclusions 

In this paper, the first convergence result for random search algorithms to a
subset of the Pareto set of given maximum size $k$ and bounds on the
approximation quality $\epsilon$ was given. The convergence was enabled by a new
selection scheme, given as Algorithm 2, that compares the new candidate solution
to the current archive using a multi-level grid.

In many parts, the assumption of a finite search space was used. Even though
this is a reasonable assumption for any implementation in computer arithmetic
with finite precision, an extension to the continuous case would be desirable.
Even though it might be a justified assumption that the results can be extended,
recent experience [19] has shown that this might involve considerable effort.